
@845473182 845473182 commented Nov 17, 2025

What this PR does / why we need it?

Integrate grouped_matmul_swiglu_quant_weight_nz_tensor_list into dynamic EPLB to support list-type parameters.
This PR also modifies the model-loading logic in the dynamic EPLB scenario.
The operator is based on this PR: #3804
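To illustrate the list-type parameter the fused operator consumes, here is a minimal pure-NumPy reference sketch of the grouped matmul + SwiGLU computation over per-expert weight lists. This is a hypothetical illustration only: the function and variable names are invented for this sketch, and it deliberately omits the quantization and NZ weight-format handling that the real NPU operator performs.

```python
import numpy as np

def swiglu(x):
    """SwiGLU activation: split the last dim into gate/up halves,
    apply SiLU to the gate, and multiply elementwise."""
    gate, up = np.split(x, 2, axis=-1)
    return (gate / (1.0 + np.exp(-gate))) * up

def grouped_matmul_swiglu_ref(tokens, group_sizes, weight_list):
    """Reference grouped matmul + SwiGLU over list-type weights.

    tokens:      (N, H) activations, rows already sorted by expert
    group_sizes: number of rows routed to each expert (sums to N)
    weight_list: one (H, 2*I) gate/up projection per expert -- the
                 list-type parameter this PR teaches dynamic EPLB to pass
    """
    out, start = [], 0
    for size, w in zip(group_sizes, weight_list):
        # each expert multiplies only its own slice of the token batch
        out.append(swiglu(tokens[start:start + size] @ w))
        start += size
    return np.concatenate(out, axis=0)

# Hypothetical usage: 3 experts, hidden size 8, intermediate size 4
rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 8))
weights = [rng.standard_normal((8, 8)) for _ in range(3)]  # 2*I = 8
y = grouped_matmul_swiglu_ref(tokens, [2, 3, 1], weights)
print(y.shape)  # (6, 4)
```

Keeping the weights as a Python list (rather than one stacked tensor) is what lets dynamic EPLB swap individual expert weights in and out during redistribution without reallocating the whole buffer.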

Does this PR introduce any user-facing change?

no

How was this patch tested?

vllm serve /home/weight/DeepSeek-V3.1_w8a8mix_mtp \
    --max_num_seqs 8 \
    --max-model-len 8192 \
    --max-num-batched-tokens 16384 \
    --tensor-parallel-size 8 \
    --data-parallel-size 2 \
    --enable-expert-parallel \
    --served-model-name ds_r1 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --no-enable-prefix-caching \
    --port 8999 \
    --quantization "ascend" \
    --gpu-memory-utilization 0.85 \
    --trust-remote-code \
    --compilation_config '{"cudagraph_capture_sizes":[1,2,4,8,16,32]}' \
    --additional-config='{"dynamic_eplb":true, "num_iterations_eplb_update":100, "num_wait_worker_iterations":100}'
 

Input/output lengths: 2K / 2K tokens.
This PR: (benchmark screenshot: "fusion")

Baseline: (benchmark screenshot: "baseline")

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@845473182 845473182 changed the title Integrate grouped_matmul_swiglu_quant_weight_nz_tensor_list into dynamic EPLB [EPLB][Ops] Integrate grouped_matmul_swiglu_quant_weight_nz_tensor_list operator into dynamic EPLB Nov 28, 2025
@845473182 845473182 marked this pull request as ready for review November 28, 2025 07:48
@weijinqian0 weijinqian0 added labels: ready (read for review), ready-for-test (start test by label for PR) Nov 29, 2025
白永斌 added 19 commits November 30, 2025 00:13
Signed-off-by: 白永斌 <[email protected]>
Signed-off-by: 欧派果奶我还要 <[email protected]>
白永斌 added 7 commits November 30, 2025 14:02
Signed-off-by: 白永斌 <[email protected]>
Signed-off-by: 欧派果奶我还要 <[email protected]>
@845473182 845473182 force-pushed the gmm_swiglu_quant_tensor_list branch from fef26ce to 1777304 Compare November 30, 2025 06:05
Signed-off-by: 欧派果奶我还要 <[email protected]>